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A numerical algorithm is given for implementing a nonparametric maxi- 
mum penalized likelihood estimator similar to those proposed by Good and ‘ 
Gaskins and those proposed by de Montricher, Tapia and Thompson. It is 
shown how the resulting nonlinear constrained optimization problem may be 
; effectively solved by using Tapia's approach to Newton's method for con- 
strained problems. . 
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1. Introduction. de Montricher, Tapia and 
Thompson demonstrated that the standard 
histogram was an unstable maximum likelihood 
density estimator and considered maximum 
penalized likelihood estimators similar to 
those previously considered by Good and 
Gaskins (1971). Specifically suppose we are 
given fhe random sampie xX,,..-,%,€ (a,b) . 
Let Ho(a,b) consist of the functions f 
defined on (a,b) with the property that 
f(a)=f£(b)=0 and £' is a member of 
L‘(a,b). Estimate the density function which 
gave rice to the random sample x,,...; 

by the solution of the constrained optimiza- 
tion problem 


(1.1) max L(£);£€H} (a,b), £20 and 


i.e., a polynomial of degree two plus a spline 
of degree one. We now give a numerical algo- 
rithm for approximating this monospline. 


2. The Discrete Problem. For given n, con- 
sider the mesh t,,... stn+, Where ty=a+t ih, 
i=0,...,n+1 with h=(b-aV/@+1). Let 

at denote the vector space of all continuous 
piecewise linear functions which have knots at 
Cheseeyty and vanish at a and b. For 

p xl let yy=p(ty), i=0,...,n+1. Then 
Yo=Y¥rt1=0 and 


(2.1) p(x) >0y; 30, 
(2.2) ie (xjdx = h £ 
° p(x)dx = 
a ixo't 


L=1l,...,n 


b b 1 n . 2 
J, f@dax=1, (2.3) [pt (xyPax=t pee as 


| 


where 


vr 


(1.2) LC) =! 
i=l 


let . 


b 
ter cxy |? 
E(x, exp(-af |£'(x) | ax), (90). (9 4 vy stot x, tn (a,c, +5) 


The functional L in (1.2) is called the 


i penalized likelihood and the solution of 


i (1.1) is called the maximum penalized likeli- 
hood estimator based on the random sample 


Xp ace+.%y_- de Montricher, Tapia and Thompson 
tN ’ 
(19955 proved that problem (1.1) has a unique 


Solution and is a monospline of degree two, 
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h h 
(2.5) vy =#of xy in (ty_j+>, tit+z)s 
i=2,...,n-1 
h 
(2.6) va=#of x, in (t)_1+5,b). 


We shall assume that we have enough data so 
that v,>OVi . Our finite dimensional 


approximation to problem (I. a is 


-1 
(2.7) max Lty); yy >0Vi and eyyach” 


where a Y “1 Ly 2 
(2.8) L(y) . t-1 ct exp(-¢h i=1 Gacy) ). 


Clearly (2.7) is & constrained optimization 
problea in R®° . 


Proposition 2.1. The constraints y,20 ‘of 
problen (2.7) are not active at the solution. 


Proof, If y*=(hN)(1,...,1), then y* 
Satisfies all the constraints of problem 
(2.7) and L(y) >0. Moreover, if 

v= (ype en ) js such that y¥,=9 for 
some i, then L(y) =0. This proves the 
proposition. 


It follows that we can obtain the solu- 
tion of problem: (2.7) py solving 


(2.9) min(-log Ly)); 5 y,=h v1 
i=l 
where from (2.8) we see that 


(2.10) -log(iy)) = - Ev log(y,) 
i= 


3 2 
+ on” Eee “yy: 
1= 


3. The Algorithm. The Lagrangian for problem 


(2.9) is 
(3.1) 4,2) = - ey, log(y,) 
i= 
a 2 
toh“! E (yy) 
1=0 itl i 
a -1 
+402 yyoh « 
1=1 
The gradient of the Lagrangian is 
-1 
(3.2) Vh(y,A) = (2... 9 Vy 
+ aon" *(-y, 1+ 2y, iS Yy-p) 
+ Apesed? 


and the Hessian of the Lagrangian is the 
diagonally dominant tridiagonal matrix 


a5 st 
4,4 4 
(3.3) V2¢y,2)= SOR 
n-l 
a a 
A 
ay 4, 


p74, = 20h") and 


re “1,2 
a5 = taht + v, (yi) 


where 4 


It therefore follows that Tapia's (1974), 
(1976) approach to Newton's method for con- 
strained problems is a natural one for this 
Problem and the operation count will be 


ti 


O(n) per iteration instead of the usual 
O(n’) expected from Newton's method, 


n 3. 
let g(y) = hy oh T Then 
i=l 


U=Ve(y) =(1,..-,1)) - We use (,) to des 
note the inner product in R® , 


The Newton-like Algorithm, 


Step 1. Determine @>0, €>0, y?, nr? and set 
k:=0 , 


Step 2. Calculate ; 
LS Cy vagy aly yy lacy) - vnc, a5 “ud 


and 
- 1 
ytd an KPa cyt a beecyk att) 


Step 3. If lIvt(x*,»5 Ice, then stop. 
If not, then set k:=ktl and go to 
Step 2. 
Initialization values could be 
e=107" 
w= 5.0 


y= (oh) 1(1,...,1) 


ae 35 5 ee 8 “1 
AY = +2 (nh) Cyt yg) + Evry) . 


ta ws as 
Vom eee oa 


This 2° is given by the projection formula 
in Tapia (1976). 

For a complete description of this algo- 
rithm and related quasi-Newton methods for 
constrained optimization the reader is referred 
to Tapia (1976). 


Proposition 3.1. The above algorithm is locally 
quadratically convergent and requires only 
O(n) operations per iteration. 


4. Some Numerical Examples. Although for 


reasons of conciseness it was appropriate to 
develop the above discrete maximum penalized 
likelihood algorithm using the integral of 
the square of the first derivative in the 
penalty team, we have found by experience 
that the slightly more complicated second de- 
rivative approach gives less locally "rough" 
estimators. geen t we consider the problem 


(4.1) Max L(£); if Eu, 5 (ab), £20 and 


r  fodeel 
where b 
(4.2) L(f) = B ecx,pexpt-of | £" (x) (ax) »(@>0). 
i= 


The details of the algorithm to approximate 
the solution to this problem are omitted, 
since they are very similar to the argument 
in Section 2. The operation count is still 
O(n) per iteration. 

In Figure 1, we demonstrate the solution 
to (4,1) using a random sample of size 20 frum 
the standard normal distribution with mean 0 
and variance 1. In Figure 2, we show the 


wR 


eines ets Ste 


a 


estimator based on a sample of size 100. In 
Figures 3 and 4 we show the D.M.P.L.E.esti- 
mators for the 50-50 mixture of two normal 
distributions, both having variance 1 and 
with means at -1.5 and +1.5 for samples of 
size 25 and 100 respectively. 


One comforting feature of the maximim 
penalized likelihood procedure is the rela - 
tively robust quality of the estimator in that 
changes of the optimal @ with N= and from 
distribution to distribution tend not to be 
traumatic, and that a rough and ready guess 
for a (e.g., 10) is frequently satisfactory. 
In Figures 5 and 6 we show an estimate for 
the Gaussian mexture mentioned above for a 
sample size of 300 and q@ values of 10 
and .1. 


If the density to be eStimated is de- 
noted by f£(.) and the D.M.P.L.E. is de- 
noted by f(-°), then we consider as one 
measure of estimate quality the average inte- 
grated mean square error 


(4.3) IMSE =f (F(x) - £(x))7£(x)dx 


I.M.S.E.'s for various @ and N are given 
in Table 1 for the standard normal, the 
50-50 normal mixture mentioned above, the 

t distribution with 5 degrees of freedom 
and the F distribution with (10,10) de- 
grees of freedom. 
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TABLE 1 
Average I.M.S.E, of the D.M.P.L.E. for 
@ Perturbed by a Factor of Two. Divide 


@ by 10 for the Fi0,10 Samples. 
1.M.S.E. 
for 

Sample a=5 a=10 _a=20 
N(O.1) N=25 00242 .00267 ~—-.00427 
N(0.1) N=100 .00093  .00079 00089 
N(0,1) N=400 .00037 00033  .00035 
N(0.1) N=800 .00028 00022 00019 
Bimodal N=25 .00197 00159 .00152 
Bimodal N= 100 .00070 00054 .00171 
Bimodal N= 400 .00030  .00024 00022 
t, N=25 00297  .00282  .00350 
t, N= 100 .00092 .00084 .00101 
t, N=400 .00039  .00032 00030 
Fi0,10 N=25  .03208 .03865 .05519 
Fig,i9 N= 100 00996 .01390 02411 
F N=400 .00292 .00450 .00740 
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